
Fix PrepareResourceClaim to read the rdmadev from sysfs path as well #57

Open
neaggarwMS wants to merge 1 commit into kubernetes-sigs:main from neaggarwMS:main

Conversation

@neaggarwMS

This PR fixes the PrepareResourceClaim function to read the RDMA device name from the sysfs path as a fallback when rdmamap fails to detect the device.

Changes:

Testing:

  • Unit tests pass
  • Tested on AKS cluster with Azure RDMA NIC (SKU: Standard_ND96isr_H100_v5)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: neaggarwMS
Once this PR has been reviewed and has the lgtm label, please assign michaelasp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@linux-foundation-easycla

CLA Not Signed

@k8s-ci-robot
Contributor

Welcome @neaggarwMS!

It looks like this is your first PR to kubernetes-sigs/dranet 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/dranet has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Jan 28, 2026
@k8s-ci-robot
Contributor

Hi @neaggarwMS. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the labels cncf-cla: no (indicates the PR's author has not signed the CNCF CLA) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) on Jan 28, 2026
Comment on lines +88 to +109

	// Fallback to sysfs check if rdmamap fails. This is particularly related to a known
	// issue to detect RDMA devices for certain Mellanox NICs
	// https://github.com/Mellanox/rdmamap/issues/15

	rdmaDir := filepath.Join("/sys/class/net", ifName, "device/infiniband")

	entries, err := os.ReadDir(rdmaDir)
	if err != nil {
		return "", fmt.Errorf("no RDMA device for %s: %w", ifName, err)
	}

	for _, entry := range entries {
		if entry.IsDir() {
			return entry.Name(), nil // Return first RDMA device found (e.g., "mlx5_0")
		}
	}

	return "", fmt.Errorf("no RDMA device found for %s", ifName)
}

return rdmaDev, nil
Contributor

Should we consider making this a function in pkg/inventory/sysfs.go, since all the sysnet operations are added in that file?

Member

+1

Extending this further, can we please refactor the existing function

func hasRDMADeviceInSysfs(ifName string) bool {
into two functions, similar to the upstream implementation of IsRDmaDeviceForNetdevice and GetRdmaDeviceForNetdevice, where IsRDmaDeviceForNetdevice is a simple wrapper over GetRdmaDeviceForNetdevice? (Ref. https://github.com/Mellanox/rdmamap/blob/37bd11cc4c57da931b7b117f829fb663d46ce480/rdma_map.go#L348-L368 )

Contributor

+1

Member

@gauravkghildiyal left a comment

Thanks @neaggarwMS. Mostly looks good.

(Please take a look at the presubmits to sign the CLA)

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Jan 28, 2026
@gauravkghildiyal
Member

Hey @neaggarwMS, checking in to see if you still need this change and will be able to accommodate the minor proposed comment.

@aojea
Contributor

aojea commented Feb 5, 2026

Ping @neaggarwMS, please check the last comment and also sign the CLA; this seems almost ready to go.

Thanks


Labels

cncf-cla: no - Indicates the PR's author has not signed the CNCF CLA.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants